Abstract. Operator precedence grammars define a classical Boolean and deterministic
context-free family (called Floyd languages or FLs). FLs have been shown to strictly
include the well-known visibly pushdown languages, and enjoy the same nice closure
properties. We introduce here Floyd automata, an equivalent operational formalism for
defining FLs. This also makes it possible to extend the class to infinite strings, for
instance to perform model checking.

Keywords: Operator precedence languages. Deterministic Context-Free languages.
Omega languages. Pushdown automata.



1 Introduction 

The history of formal language theory has always paired two main and complementary
formalisms to define and process languages (not only formal ones): grammars or syntaxes,
and abstract machines or automata. The power and the complementary benefits of these
two formalisms are so evident and well known that it is certainly superfluous to recall
them here. Also universally known are the conceptual relevance and practical impact
of the family of context-free languages and the corresponding grammars paired with
pushdown automata.

Among the many subfamilies that have been introduced throughout the last decades
with various goals, operator precedence grammars, herewith renamed Floyd grammars
(FGs) in honor of their inventor [9], represent a pioneering model mainly aimed at
deterministic, and therefore efficient, parsing. Visibly pushdown languages (VPLs) are
a much more recent subfamily of (deterministic) context-free languages introduced in
the seminal paper [1] with the goal of extending the typical closure properties of regular
languages to larger families of languages accepted by infinite-state machines; a major
practical result is the possibility of extending such powerful verification techniques as
model checking beyond the scope of finite state machines. Following the usual tradition,
VPLs have been characterized both in terms of abstract machines, the visibly pushdown
automata (VPAs), and by means of a suitable subclass of context-free grammars.



* This is an extended version of the paper which appeared in Proceedings of CSR 2011, 6th
International Computer Science Symposium in Russia, Lecture Notes in Computer Science,
vol. 6651, pp. 291-304, 2011. In particular, Theorem 1 has been corrected and a complete
proof is given in the Appendix.
Rather surprisingly, instead, investigation of the basic (and nice, indeed) properties
of FGs has been suspended, probably as a consequence of the advent of other, more
general, parsing techniques, such as LR parsing [10]. Although FGs obviously generate
a subclass of deterministic CF languages and therefore can be parsed by any deterministic
pushdown machine, typically a shift-reduce one [10], we are not aware of a family
of automata that perfectly matches the generative power of this class of grammars. On
the other hand, operator precedence parsers are still used today, thanks to their elegant
simplicity and efficiency. For instance, they are present in Parrot, Perl 6's virtual
machine, as part of the Parser Grammar Engine (PGE), and in GCC's hand-coded C and C++
parsers, for managing arithmetic expressions.¹

Quite recently we realized that there are strong relations between these two seemingly
unrelated families of languages; precisely, we showed that VPLs are a proper subclass of
the languages defined by FGs (i.e. Floyd Languages, or FLs in short), and coincide with
those languages that can be generated by FGs characterized by a well-defined shape of
operator precedence matrix (OPM). The inclusion relation is effective in that a FG can be
algorithmically derived from a VPA and, conversely, a VPA can be obtained from a FG
whose OPM satisfies the restriction [6].

FLs enjoy all typical closure properties of regular languages that motivated the study 
of VPLs and other related families M31 12141 . Precisely, closure w.r.t. Boolean operations 
was proved a long time ago in Q, whereas closure under concatenation, Kleene star, 
and other typical algebraic operations has been investigated only recently, under the
novel interest ignited by the above remark [6]. Thus, the old-fashioned FLs turned out
to be the largest known class of deterministic context-free languages that enjoy closure
under all traditional language operations. Another reason why, in our opinion, FLs are
far from obsolete and uninteresting these days is that, unlike most other deterministic
languages of practical use, they need not be parsed left-to-right, thus offering
interesting opportunities, e.g., to exploit parallelism and incrementality [10].

In this paper we provide another missing tile of the "old and new puzzle": we
introduce a novel class of stack-based automata perfectly carved on the generation
mechanism of FGs, which we also name in honor of Robert Floyd. Not surprisingly, they
inherit some features of VPAs (mainly a clear separation between push and pop
operations) and maintain some typical behavior of shift-reduce parsing algorithms; however,
they also exhibit some distinguishing features, and deriving them automatically from FGs,
and conversely, involves some non-trivial technicalities.

The availability of a precise family of automata makes it possible to apply to FLs the
now familiar $\omega$-extension (a further extension of the Kleene * operation), i.e., the
definition of languages of infinite strings and the various criteria for their acceptance
or rejection by recognizing devices. $\omega$-languages are more and more important to deal
with never-ending computations such as operating systems, web services, embedded
applications, etc. Thus, we also introduce the $\omega$-version of FLs and we show their
potential in terms of modeling the behavior of some realistic systems.

The paper is structured as follows: Section 2 recalls basic definitions on Floyd
grammars; Section 3 introduces Floyd automata (FAs) and shows that, like FSMs



¹ The interested reader may find more information at http://www.parrot.org and
http://gcc.gnu.org, respectively.
and VPAs, but unlike pushdown automata, their deterministic version is not less
powerful than the nondeterministic counterpart; Section 4 provides effective constructions
to derive a FA from a FG and conversely; Section 5 extends the definition of FLs to
sets of infinite strings by applying to FAs the well-known concepts of $\omega$-behavior and
acceptance; finally, Section 6 draws some conclusions.

2 Preliminaries 

Let $\Sigma$ be an alphabet. The empty string is denoted $\varepsilon$. A context-free (CF)
grammar is a 4-tuple $G = (N, \Sigma, P, S)$, where $N$ is the nonterminal alphabet, $P$ the
rule (or production) set, and $S$ the axiom. An empty rule has $\varepsilon$ as the right hand
side (r.h.s.). A renaming rule has one nonterminal as r.h.s. A grammar is reduced if every
rule can be used to generate some string in $\Sigma^*$. It is invertible if no two rules have
identical r.h.s.

The following naming convention will be adopted, unless otherwise specified: lowercase
Latin letters $a, b, \ldots$ denote terminal characters; uppercase Latin letters
$A, B, \ldots$ denote nonterminal characters; letters $u, v, \ldots$ denote terminal strings;
and Greek letters $\alpha, \ldots, \omega$ denote strings over $\Sigma \cup N$. The strings may
be empty, unless stated otherwise.

A rule is in operator form if its r.h.s. has no adjacent nonterminals; an operator
grammar (OG) contains just such rules. Any CF grammar admits an equivalent OG,
which can also be assumed to be invertible [11,13].

The coming definitions for operator precedence grammars |9|, here renamed Floyd 
Grammars (FG), are from ITJ . We refer the reader unfamiliar with precedence grammars 
and parsing techniques to [10], which contains an easily readable, practical description
of FGs.

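For an OG $G$ and a nonterminal $A$, the left and right terminal sets are

\[ \mathcal{L}_G(A) = \{a \in \Sigma \mid A \stackrel{*}{\Rightarrow} B a \alpha\}, \qquad \mathcal{R}_G(A) = \{a \in \Sigma \mid A \stackrel{*}{\Rightarrow} \alpha a B\} \]

where $B \in N \cup \{\varepsilon\}$ and $\Rightarrow$ denotes the derivation relation. The
grammar name $G$ will be omitted unless necessary to prevent confusion.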

R. Floyd took inspiration from the traditional notion of precedence between arith- 
metic operators in order to define a broad class of languages, such that the shape of 
the derivation tree is solely determined by a binary relation between terminals that are 
consecutive, or become consecutive after a bottom-up reduction step. 

For an OG $G$, let $\alpha, \beta$ range over $(N \cup \Sigma)^*$ and $a, b \in \Sigma$. Three
binary operator precedence (OP) relations are defined:

equal in precedence: $a \doteq b \iff \exists A \to \alpha a B b \beta,\ B \in N \cup \{\varepsilon\}$

takes precedence: $a \gtrdot b \iff \exists A \to \alpha D b \beta,\ D \in N$ and $a \in \mathcal{R}_G(D)$

yields precedence: $a \lessdot b \iff \exists A \to \alpha a D \beta,\ D \in N$ and $b \in \mathcal{L}_G(D)$

For an OG $G$, the operator precedence matrix (OPM) $M = OPM(G)$ is a $|\Sigma| \times |\Sigma|$
array that with each ordered pair $(a, b)$ associates the set $M_{ab}$ of OP relations holding
between $a$ and $b$.

Definition 1. $G$ is an operator precedence or Floyd grammar (FG) if and only if
$M = OPM(G)$ is a conflict-free matrix, i.e., $\forall a, b$, $|M_{ab}| \le 1$.
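
The definitions above can be turned into a small computation. The following Python sketch
derives the left and right terminal sets by fixpoint iteration, builds the OPM, and checks
Definition 1. The grammar used (S -> E | T, E -> E + T | T x n, T -> T x n | n) is an
assumption chosen for illustration, and all function names are hypothetical.

```python
# Hypothetical operator grammar, assumed here for illustration only.
GRAMMAR = {
    "S": [["E"], ["T"]],
    "E": [["E", "+", "T"], ["T", "x", "n"]],
    "T": [["T", "x", "n"], ["n"]],
}


def terminal_sets(grammar, left=True):
    """Least fixpoint of the left (or right) terminal sets of all nonterminals."""
    sets = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, rules in grammar.items():
            for rhs in rules:
                seq = rhs if left else rhs[::-1]
                new = set()
                if seq[0] in grammar:        # r.h.s. starts (ends) with a nonterminal B
                    new |= sets[seq[0]]
                    if len(seq) > 1:         # the terminal adjacent to B
                        new.add(seq[1])
                else:
                    new.add(seq[0])
                if not new <= sets[lhs]:
                    sets[lhs] |= new
                    changed = True
    return sets


def opm(grammar):
    """Map each ordered terminal pair (a, b) to the set of OP relations holding."""
    left, right = terminal_sets(grammar, True), terminal_sets(grammar, False)
    m = {}

    def add(a, b, rel):
        m.setdefault((a, b), set()).add(rel)

    for rules in grammar.values():
        for rhs in rules:
            for i in range(len(rhs) - 1):
                x, y = rhs[i], rhs[i + 1]
                if x not in grammar and y not in grammar:
                    add(x, y, "=")                   # ... a b ...
                elif x not in grammar:               # ... a D ...
                    for b in left[y]:
                        add(x, b, "<")
                    if i + 2 < len(rhs):             # ... a D b ...
                        add(x, rhs[i + 2], "=")
                else:                                # ... D b ... (operator form assumed)
                    for a in right[x]:
                        add(a, y, ">")
    return m


def is_conflict_free(m):
    """Definition 1: at most one relation per ordered pair."""
    return all(len(rels) <= 1 for rels in m.values())


M = opm(GRAMMAR)
print(sorted(M.items()), is_conflict_free(M))
```

The sketch assumes that the input grammar is already in operator form; it does not add the
delimiter # to the matrix.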
Example 1. Arithmetic expressions with prioritized operators, a classical construct, are
presented in a simple variant without parentheses. Figure 1 presents the productions of
the grammar (left) and the derivation tree of expression $n + n \times n$ (center). We see
that $\times \doteq n$ because they appear in the right-hand side of the same production.
Analogously, $+ \lessdot n$ since $+$ is sibling of a node with label $T$ and
$n \in \mathcal{L}(T)$. The complete OPM is shown in Figure 1 (right).

Fig. 1. The Floyd grammar for arithmetic expressions without parentheses.



The equal in precedence relations of a FG alphabet are connected with an important
parameter of the grammar, namely the length of the right hand sides of the rules.
Clearly, a rule $A \to A_1 a_1 \ldots A_t a_t A_{t+1}$, where each $A_i$ is a possibly missing
nonterminal, is associated with relations $a_1 \doteq a_2 \doteq \cdots \doteq a_t$. If the
$\doteq$ relation is cyclic, there is no finite bound on the length of the r.h.s. of a
production. Otherwise the length is bounded by $2 \cdot c + 1$, where $c \ge 1$ is the length
of the longest $\doteq$-chain. In this paper, for the sake of simplicity and brevity, we
assume that all precedence matrices are $\doteq$-cycle free. In the
case of FGs this prevents the risk of rh.s of unbounded length |T|, in the case of FAs 
we will see that it avoids a priori the risk of an unbounded sequence of push operations
onto the stack matched by only one pop operation. The hypothesis of $\doteq$-cycle freedom
could be replaced by weaker ones, such as a bound on the r.h.s., as it happens with FGs, at
the price of heavier notation, constructions, and proofs.
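
As an illustration of the bound discussed above, the following Python sketch checks whether
the equal-in-precedence relation is cycle free and, if so, computes the length c of the
longest chain and the resulting bound 2c + 1 on the length of right hand sides. The function
names and the example alphabet are assumptions made for this sketch.

```python
def longest_equal_chain(alphabet, eq_pairs):
    """Number of terminals in the longest chain a1 = a2 = ... = ac, or None if cyclic."""
    succ = {a: set() for a in alphabet}
    for a, b in eq_pairs:
        succ[a].add(b)
    state = {}  # terminal -> "open" while on the DFS path, else longest chain from it

    def visit(a):
        if state.get(a) == "open":
            raise ValueError("the relation is cyclic")
        if a not in state:
            state[a] = "open"
            state[a] = 1 + max((visit(b) for b in succ[a]), default=0)
        return state[a]

    try:
        return max((visit(a) for a in alphabet), default=0)
    except ValueError:
        return None


def rhs_length_bound(alphabet, eq_pairs):
    """Bound 2c + 1 on the r.h.s. length, or None when no finite bound exists."""
    c = longest_equal_chain(alphabet, eq_pairs)
    return None if c is None else 2 * c + 1


# Assumed example: the only equal-in-precedence pair is (x, n).
print(rhs_length_bound({"+", "x", "n"}, {("x", "n")}))
print(rhs_length_bound({"a", "b"}, {("a", "b"), ("b", "a")}))
```

With the single pair (x, n) the longest chain has two terminals, so right hand sides have at
most five symbols; the second call shows a cyclic relation, for which no bound exists.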

Definition 2. A FG is in Fischer normal form [8] if it is invertible, the axiom $S$ does
not occur in the r.h.s. of any rule, no empty rule exists except possibly
$S \to \varepsilon$, the other rules having $S$ as l.h.s. are renaming, and no other renaming
rules exist.

OPMs play a fundamental role in deterministic parsing of FGs. Thus, in view of
defining automata to parse FLs, we pair them with the alphabet, somewhat mimicking
the VPL approach, where the terminal alphabet is partitioned into calls, returns, and
internals. To this end, we use a special symbol # not in $\Sigma$ to mark the beginning and
the end of any string. This is consistent with the typical operator parsing technique,
which requires the lookback and lookahead of one character to determine the precedence
relation [10]. The precedence relations in the OPM are extended to include # in the usual
way.

Definition 3. An operator precedence alphabet is a pair $(\Sigma, M)$ where $\Sigma$ is an
alphabet and $M$ is a conflict-free operator precedence matrix, i.e. a
$|\Sigma \cup \{\#\}|^2$ array that with each ordered pair $(a, b)$ associates at most one of
the operator precedence relations: $\doteq$, $\lessdot$ or $\gtrdot$.

For $u, v \in \Sigma^*$ we write $u \lessdot v$ if $u = xa$ and $v = by$ with $a \lessdot b$.
Similarly for the other precedence relations.



3 Floyd automata 

Definition 4. A nondeterministic precedence automaton (or Floyd automaton) is given
by a tuple $A = \langle \Sigma, M, Q, I, F, \delta \rangle$ where:

- $(\Sigma, M)$ is a precedence alphabet,

- $Q$ is a set of states (disjoint from $\Sigma$),

- $I \subseteq Q$ is a set of initial states,

- $F \subseteq Q$ is a set of final states,

- $\delta : Q \times (\Sigma \cup Q) \to 2^Q$ is the transition function.

The transition function can be seen as the union of two disjoint functions:

\[ \delta_{push} : Q \times \Sigma \to 2^Q, \qquad \delta_{flush} : Q \times Q \to 2^Q . \]

A nondeterministic precedence automaton can be represented by a graph with $Q$ as the
set of vertices and $\Sigma \cup Q$ as the set of edge labellings: there is an edge from state
$q$ to state $p$ labelled by $a \in \Sigma$ if and only if $p \in \delta_{push}(q, a)$, and there
is an edge from state $q$ to state $p$ labelled by $r \in Q$ if and only if
$p \in \delta_{flush}(q, r)$. To distinguish flush transitions from push transitions we denote
the former ones by a double arrow.

To define the semantics of the automaton, we introduce some notations. We use
letters $p, q, p_i, q_i, \ldots$ for states in $Q$ and we set
$\Sigma' = \{a' \mid a \in \Sigma\}$; symbols in $\Sigma'$ are called marked symbols. Let
$\Gamma = (\Sigma \cup \Sigma' \cup \{\#\}) \times Q$; we denote symbols in $\Gamma$
as $[a\ q]$, $[a'\ q]$, or $[\#\ q]$, respectively. We set
$symbol([a\ q]) = symbol([a'\ q]) = a$, $symbol([\#\ q]) = \#$, and
$state([a\ q]) = state([a'\ q]) = state([\#\ q]) = q$. Given a string
$\beta = B_1 B_2 \ldots B_n$ with $B_i \in \Gamma$, we set $state(\beta) = state(B_n)$.

We call a configuration any pair $C = \langle \beta, w \rangle$, where
$\beta = B_1 B_2 \ldots B_n \in \Gamma^*$, $symbol(B_1) = \#$, and
$w = a_1 a_2 \ldots a_m \in \Sigma^* \#$. A configuration represents both the
contents $\beta$ of the stack and the part of input $w$ still to process. We also set
$top(C) = symbol(B_n)$ and $input(C) = a_1$.

A computation of the automaton is a finite sequence of moves $C \vdash C_1$; there are three
kinds of moves, depending on the precedence relation between $top(C)$ and $input(C)$:

push move:

if $top(C) \doteq input(C)$ then $\langle \beta, aw \rangle \vdash \langle \beta[a\ q], w \rangle$, $\forall q \in \delta_{push}(state(\beta), a)$;

mark move:

if $top(C) \lessdot input(C)$ then $\langle \beta, aw \rangle \vdash \langle \beta[a'\ q], w \rangle$, $\forall q \in \delta_{push}(state(\beta), a)$;

flush move:

if $top(C) \gtrdot input(C)$ then let $\beta = B_1 B_2 \ldots B_n$ with $B_j = [x_j\ q_j]$,
$x_j \in \Sigma \cup \Sigma'$, and let $i$ be the greatest index such that $B_i$ belongs to
$\Sigma' \times Q$. Then

\[ \langle \beta, aw \rangle \vdash \langle B_1 B_2 \ldots B_{i-2}[x_{i-1}\ q], aw \rangle, \quad \forall q \in \delta_{flush}(q_n, q_{i-1}). \]

Push and mark moves both push the input symbol on the top of the stack, together
with the new state computed by $\delta_{push}$; such moves differ only in the marking of the
symbol on top of the stack. The flush move is more complex: the symbols on the top of
the stack are removed until the first marked symbol (included), and the state of the next
symbol below them in the stack is updated by $\delta_{flush}$ according to the pair of states
that delimit the portion of the stack to be removed; notice that in this move the input
symbol is not relevant and it remains available for the following move.

Finally, we say that a configuration $[\#\ q_I]$ is starting if $q_I \in I$ and a
configuration $[\#\ q_F]$ is accepting if $q_F \in F$. The language accepted by the automaton
is defined as:

\[ L(A) = \{ x \mid \langle [\#\ q_I], x\# \rangle \stackrel{*}{\vdash} \langle [\#\ q_F], \# \rangle,\ q_I \in I,\ q_F \in F \}. \]

Example 2. The automaton depicted in Figure 2 accepts the Dyck language $L_D$ of
balanced strings of parentheses, with two parentheses pairs $a, \underline{a}$ and
$b, \underline{b}$. The same figure
also shows an accepting computation on input aba aba aa. 
Fig. 2. Automaton, precedence matrix, and example of computation for language $L_D$.
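
The semantics of push, mark, and flush moves can be sketched as a small simulator. The
Python code below explores all computations of a nondeterministic Floyd automaton breadth
first. The dictionary layout, the function name `accepts`, and the two-state automaton for a
Dyck language are assumptions made for this sketch (the closing parentheses are written A
and B); they are not taken from Figure 2.

```python
from collections import deque


def accepts(fa, word):
    """Breadth-first search over the configurations reachable on `word`."""
    opm, push, flush = fa["opm"], fa["push"], fa["flush"]
    inp = tuple(word) + ("#",)
    # A stack entry is (symbol, marked, state); a configuration is (stack, position).
    start = [((("#", False, q),), 0) for q in fa["initial"]]
    seen, todo = set(start), deque(start)
    while todo:
        stack, i = todo.popleft()
        top_sym, _, top_state = stack[-1]
        a = inp[i]
        if len(stack) == 1 and a == "#":
            if top_state in fa["final"]:
                return True
            continue
        rel = opm.get((top_sym, a))
        succs = []
        if rel in ("=", "<") and a != "#":          # push move or mark move
            for q in push.get((top_state, a), ()):
                succs.append((stack + ((a, rel == "<", q),), i + 1))
        elif rel == ">":                            # flush move
            marked = [k for k, entry in enumerate(stack) if entry[1]]
            if marked:
                k = marked[-1]                      # topmost marked symbol
                sym, mk, below = stack[k - 1]
                for q in flush.get((top_state, below), ()):
                    succs.append((stack[:k - 1] + ((sym, mk, q),), i))
        for conf in succs:
            if conf not in seen:
                seen.add(conf)
                todo.append(conf)
    return False


# Assumed precedence matrix for two parentheses pairs a/A and b/B.
OPEN, CLOSE = "ab", {"a": "A", "b": "B"}
OPM = {("#", o): "<" for o in OPEN}
OPM.update({(o, p): "<" for o in OPEN for p in OPEN})
OPM.update({(o, CLOSE[o]): "=" for o in OPEN})
OPM.update({(c, x): ">" for c in "AB" for x in "abAB#"})

DYCK = {
    "opm": OPM,
    "initial": {"q0"},
    "final": {"q0"},
    "push": {("q0", "a"): {"q1"}, ("q0", "b"): {"q1"},
             **{("q1", x): {"q1"} for x in "abAB"}},
    "flush": {("q1", "q0"): {"q0"}, ("q1", "q1"): {"q1"}},
}

print(accepts(DYCK, "abBA"), accepts(DYCK, "abAB"))
```

In this sketch a word is rejected as soon as no precedence relation holds between the top of
the stack and the next input symbol, which is how mismatched parentheses are detected.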



A Floyd automaton is called deterministic when $\delta_{push}(q, a)$ and
$\delta_{flush}(q, p)$ have at most one element, for every $q, p \in Q$ and $a \in \Sigma$,
and $I$ is a singleton. Here we prove that deterministic Floyd automata are equivalent to
nondeterministic ones, with a power-set construction similar to the one used for classical
finite state automata.

Theorem 1. Deterministic Floyd automata are equivalent to nondeterministic ones. 

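Given a nondeterministic automaton $A = \langle \Sigma, M, Q, I, F, \delta \rangle$, consider
the deterministic automaton
$\tilde{A} = \langle \Sigma, M, \tilde{Q}, \tilde{I}, \tilde{F}, \tilde{\delta} \rangle$ where:

- $\tilde{Q} = \hat{\Sigma} \times 2^{Q \times (Q \cup \{\bot\})}$, where
$\hat{\Sigma} = \Sigma \cup \{\#\}$, $Q \cap \{\bot\} = \emptyset$, and $\bot$ is a symbol that
stands for the baseline of the computations (i.e. the pseudo-state before the initial states),

- $\tilde{I} = \langle \#, I \times \{\bot\} \rangle$ is the initial state of $\tilde{A}$,

- $\tilde{F}$ is the set of pairs $\langle \#, K \rangle$ such that there exists $q \in F$ with
$(q, \bot) \in K$,

- $\tilde{\delta} : \tilde{Q} \times (\Sigma \cup \tilde{Q}) \to \tilde{Q}$ is the transition
function defined as follows. The push transition
$\tilde{\delta}_{push} : \tilde{Q} \times \Sigma \to \tilde{Q}$ is defined by

\[ \tilde{\delta}_{push}(\langle b, K \rangle, a) = \Big\langle a, \bigcup_{(q,p) \in K} \Big\{ (h, t) \;\Big|\; h \in \delta_{push}(q, a) \text{ and } t = \begin{cases} q & \text{if } b \lessdot a \\ p & \text{if } b \doteq a \end{cases} \Big\} \Big\rangle . \]

The flush transition $\tilde{\delta}_{flush} : \tilde{Q} \times \tilde{Q} \to \tilde{Q}$ is
defined as follows:

\[ \tilde{\delta}_{flush}(\langle b, K_1 \rangle, \langle a, K_2 \rangle) = \Big\langle a, \bigcup_{(r,q) \in K_1,\ (q,p) \in K_2} \{ (h, p) \mid h \in \delta_{flush}(r, q) \} \Big\rangle . \]

The proof of the equivalence between $A$ and $\tilde{A}$ is given in Appendix.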
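
The power-set construction can be exercised on small inputs. The Python sketch below runs
the deterministic automaton on the fly: each stack entry carries a set K of pairs (state,
baseline), updated as in the push and flush transitions of the construction; the symbol
component of a deterministic state is the stack symbol itself. It then compares the result
with a direct exploration of the nondeterministic automaton on all short words. The test
automaton, the dictionary layout, and all names are assumptions made for this sketch.

```python
from itertools import product

BOT = None  # stands for the baseline pseudo-state


def accepts_nondet(fa, word):
    """Depth-first exploration of all computations of the nondeterministic automaton."""
    inp = tuple(word) + ("#",)

    def run(stack, i):
        (sym, _, state), a = stack[-1], inp[i]
        if len(stack) == 1 and a == "#":
            return state in fa["final"]
        rel = fa["opm"].get((sym, a))
        if rel in ("=", "<") and a != "#":
            return any(run(stack + ((a, rel == "<", q),), i + 1)
                       for q in fa["push"].get((state, a), ()))
        marked = [k for k, entry in enumerate(stack) if entry[1]]
        if rel == ">" and marked:
            s, m, below = stack[marked[-1] - 1]
            return any(run(stack[:marked[-1] - 1] + ((s, m, q),), i)
                       for q in fa["flush"].get((state, below), ()))
        return False

    return any(run((("#", False, q),), 0) for q in fa["initial"])


def accepts_det(fa, word):
    """Single run of the power-set automaton; K is a set of (state, baseline) pairs."""
    inp = tuple(word) + ("#",)
    stack = [("#", False, frozenset((q, BOT) for q in fa["initial"]))]
    i = 0
    while True:
        sym, _, K = stack[-1]
        a = inp[i]
        if len(stack) == 1 and a == "#":
            return any(p is BOT and q in fa["final"] for q, p in K)
        rel = fa["opm"].get((sym, a))
        if rel in ("=", "<") and a != "#":
            # a mark move records the current state as the new baseline
            K2 = frozenset((h, q if rel == "<" else p)
                           for q, p in K for h in fa["push"].get((q, a), ()))
            stack.append((a, rel == "<", K2))
            i += 1
        elif rel == ">" and any(entry[1] for entry in stack):
            k = max(j for j, entry in enumerate(stack) if entry[1])
            s, m, K_below = stack[k - 1]
            K2 = frozenset((h, p) for r, q in K for q2, p in K_below if q2 == q
                           for h in fa["flush"].get((r, q), ()))
            stack[k - 1:] = [(s, m, K2)]
        else:
            return False


# Assumed nondeterministic automaton over two parentheses pairs a/A and b/B.
OPEN, CLOSE = "ab", {"a": "A", "b": "B"}
OPM = {("#", o): "<" for o in OPEN}
OPM.update({(o, p): "<" for o in OPEN for p in OPEN})
OPM.update({(o, CLOSE[o]): "=" for o in OPEN})
OPM.update({(c, x): ">" for c in "AB" for x in "abAB#"})
FA = {
    "opm": OPM, "initial": {"q0"}, "final": {"q0"},
    "push": {("q0", "a"): {"q1", "q2"}, ("q0", "b"): {"q1"}, ("q2", "A"): {"q2"},
             **{("q1", x): {"q1"} for x in "abAB"}},
    "flush": {("q1", "q0"): {"q0"}, ("q1", "q1"): {"q1"}, ("q2", "q0"): {"q0"}},
}
WORDS = ["".join(w) for n in range(5) for w in product("abAB", repeat=n)]
AGREE = all(accepts_det(FA, w) == accepts_nondet(FA, w) for w in WORDS)
print(AGREE)
```

Agreement on short words is only a sanity check of the sketch, not a substitute for the
proof referred to above.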

4 Floyd automata vs Floyd grammars 

The main result of this paper is the perfect match between FGs and FAs. 

4.1 From Floyd grammars to Floyd automata 

Theorem 2. Any language $L$ generated by a Floyd grammar can be recognized by a Floyd
automaton.

We provide a constructive proof of the theorem: given a Floyd grammar $G$ we build
an equivalent nondeterministic Floyd automaton
$A = \langle \Sigma, M, Q, I, F, \delta \rangle$, whose precedence matrix $M$ is the same as
the one associated with $G$. A successful computation of $A$ will correspond to a derivation
tree in $G$: intuitively, a push transition tries to guess the parent of the symbol currently
under the input head (i.e. it determines the l.h.s. of a rule of $G$ whose r.h.s. contains
the current symbol); a flush transition is performed whenever the r.h.s. of a rule is
completed, and determines the corresponding l.h.s., thus confirming some previous guesses.

In order to keep the construction as simple as possible, we avoid introducing any
optimization. Also, without loss of generality, we assume that the grammar
$G = (N, \Sigma, P, S)$ satisfies the following properties: the axiom $S$ does not occur in
the r.h.s. of any rule, no empty rule exists except possibly $S \to \varepsilon$, the other
rules having $S$ as l.h.s. are renaming, and no other renaming rules exist (in other words,
we assume that $G$ is in Fischer normal form except that it is not necessarily invertible).

First of all, we introduce some notation. Enumerate the productions as follows:
for any nonterminal $A \in N$, let $P_1(A), P_2(A), \ldots, P_{n(A)}(A)$ be the productions
having $A$ as l.h.s. (i.e. $n(A)$ is the number of productions having $A$ as l.h.s.). Then,
consider the set of extended nonterminals
$EN = \{A_i \mid A \in N, i = 1, 2, \ldots, n(A)\}$ and define
$Q = EN \times (EN \cup \{\bot\})$, where $\bot$ is a new symbol standing for "undefined". To
distinguish between nonterminals and extended nonterminals, we will use capital letters
$A, B, C, \ldots$ and $X, Y, Z, \ldots$, respectively.

When considering derivation trees of G, we label internal nodes with extended non- 
terminals (where the subscript of the nonterminal corresponds to the rule applied in the 
node). Moreover, with a slight abuse of notation, we sometimes confuse nodes and their 
labels, using the above convention also for internal nodes and leaves. 

To define the push transition function $\delta_{push} : Q \times \Sigma \to 2^Q$, consider any
derivation tree $t$ of $G$ with any leaf $a$ and let $X$ be $a$'s parent in $t$. Figure 3
represents the various configurations that $t$ may exhibit.

- Case 0: if there is no leaf that precedes $a$ in the in-order visit of $t$ and has depth not
greater than $a$'s depth, then let $Y$ be the topmost ancestor of $X$, i.e., $Y = S_i$ for
some $i$; this also means that $\# \lessdot a$;

- Otherwise, let $b$ be the rightmost such leaf, and let $Y$ be $b$'s parent. Notice that,
$G$ being an operator grammar, $Y$ is the nearest common ancestor of $a$ and $b$. Then
there are two possibilities:

• Case 1: $X = Y$, i.e. $b \doteq a$;

• Case 2: $X \ne Y$, and in this case $b$ has lower depth than $a$, so $b \lessdot a$.

In all cases, node $Z$ may be missing, or there may be other leaves between $b$ and $a$
(namely, $Z$'s descendants); let $\tilde{Z} = \bot$ if $Z$ is missing, $\tilde{Z} = Z$
otherwise. Then, for each such triple $(a, X, Y)$, define the $(a, X, Y)$-push transition:

\[ \delta_{push}((Y, \tilde{Z}), a) \ni \begin{cases} (X, X) & \text{if } a \text{ is the rightmost child of } X, \\ (X, \bot) & \text{otherwise.} \end{cases} \]

Hence, a push transition essentially determines the parent of the symbol under the input
head (actually, a "candidate" parent, since the automaton is nondeterministic).




[Figure 3: three derivation-tree sketches, labelled Case 0, Case 1, and Case 2.]

Fig. 3. Derivation tree configurations for the push transition function (nodes labelled as
... could be missing).



A similar construction holds for the flush transition function
$\delta_{flush} : Q \times Q \to 2^Q$. For every derivation tree with internal node $X$, let
$f$ and $\ell$ be the first and last child, respectively, of node $X$. Notice that both $f$
and $\ell$ may be either internal nodes or leaves. Then there are two possibilities, as
depicted in Figure 4:

- Case 3: there is no leaf at the left of $X$; then let $Y$ be the topmost ancestor of $X$,
i.e., $Y = S_i$ for some $i$;

- Case 4: otherwise, let $b$ be the rightmost leaf at the left of $X$ and let $Y$ be $b$'s
parent (again, notice that $Y$ is the nearest common ancestor of $X$ and $b$, $G$ being an
operator grammar).

Also, let $\tilde{\ell}$ be $\ell$ if $\ell$ is an internal node, $X$ otherwise; let
$\tilde{f}$ be $f$ if $f$ is an internal node, $\bot$ otherwise. Then, for each such pair
$(X, Y)$ define the $(X, Y)$-flush transition:

\[ \delta_{flush}((X, \tilde{\ell}), (Y, \tilde{f})) \ni (Y, X). \]

Hence, the state computed by a flush transition contains two pieces of information: the
first component determines the nearest ancestor of both $X$ and $b$ (or the axiom if $b$
does not exist), while the second component determines the nonterminal corresponding
to the r.h.s. just completed.




[Figure 4: two derivation-tree sketches, labelled Case 3 and Case 4.]

Fig. 4. Derivation tree configurations for the flush transition function (all nodes marked
as ... could be missing).



Finally, initial and final states are defined as follows.

\[ I = \{(S_i, \bot) \mid 1 \le i \le n(S)\}, \qquad F = \{(S_i, A_j) \mid S \to A \in P,\ 1 \le i \le n(S),\ 1 \le j \le n(A)\}. \]

Notice that the above construction is effective. All triples $(a, X, Y)$ involved by some
push transition can be found starting from any rule $X \to \alpha$ with $\alpha$ containing
$a$: if $a$ is not the leftmost terminal of $\alpha$, then take the triple $(a, X, X)$, else
apply backwards any rule with r.h.s. starting with $X$ and extend this process until all
productions have been examined. Similarly for the flush transitions.

Example 3. Let $G$ be the grammar introduced in Example 1. Following the above
construction, number the rules of the grammar in the order they appear in the definition of
$G$ (for instance, $P_2(E)$ is $E \to T \times a$). The transitions defined by the derivation
tree of string $a \times a + a$, depicted in Figure 5 (left), are the following:

$\delta_{push}((S_1, \bot), a) \ni (T_2, T_2)$
$\delta_{push}((S_1, T_2), \times) \ni (E_2, \bot)$
$\delta_{push}((S_1, E_2), +) \ni (E_1, \bot)$
$\delta_{push}((E_2, \bot), a) \ni (E_2, E_2)$
$\delta_{push}((E_1, \bot), a) \ni (T_2, T_2)$

$\delta_{flush}((T_2, T_2), (E_1, \bot)) \ni (E_1, T_2)$
$\delta_{flush}((T_2, T_2), (S_1, \bot)) \ni (S_1, T_2)$
$\delta_{flush}((E_2, E_2), (S_1, T_2)) \ni (S_1, E_2)$
$\delta_{flush}((E_1, T_2), (S_1, E_2)) \ni (S_1, E_1)$

The first one is the $(a, T_2, S_1)$-push transition obtained by starting from the left-most
leaf (Case 0). Case 0 occurs also for the second and third push transitions, obtained
considering the leaves labeled by $\times$ and $+$, respectively. The other push transitions
represent instances of Cases 1 and 2, in this order. As far as flush transitions are
concerned, Case 4 occurs only in the first stated transition, with $X = T_2$, $b = +$ and
$Y = E_1$, whereas all other transitions represent instances of Case 3. Hence, on input
$a \times a + a$, the automaton $A$ obtained from $G$ may execute the computation represented
in Figure 5 (right).
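\[ \begin{array}{ll}
 & \langle [\#\ (S_1, \bot)],\ a \times a + a\,\# \rangle \\
\text{mark} & \langle [\#\ (S_1, \bot)][a'\ (T_2, T_2)],\ \times a + a\,\# \rangle \\
\text{flush} & \langle [\#\ (S_1, T_2)],\ \times a + a\,\# \rangle \\
\text{mark} & \langle [\#\ (S_1, T_2)][\times'\ (E_2, \bot)],\ a + a\,\# \rangle \\
\text{push} & \langle [\#\ (S_1, T_2)][\times'\ (E_2, \bot)][a\ (E_2, E_2)],\ + a\,\# \rangle \\
\text{flush} & \langle [\#\ (S_1, E_2)],\ + a\,\# \rangle \\
\text{mark} & \langle [\#\ (S_1, E_2)][+'\ (E_1, \bot)],\ a\,\# \rangle \\
\text{mark} & \langle [\#\ (S_1, E_2)][+'\ (E_1, \bot)][a'\ (T_2, T_2)],\ \# \rangle \\
\text{flush} & \langle [\#\ (S_1, E_2)][+'\ (E_1, T_2)],\ \# \rangle \\
\text{flush} & \langle [\#\ (S_1, E_1)],\ \# \rangle
\end{array} \]

Fig. 5. Derivation tree (left) and computation (right) for the string $a \times a + a$.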
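
The computation of Example 3 can be replayed mechanically. The Python sketch below encodes
the push and flush transitions induced by the derivation tree of the example (with the
multiplication sign written x and the undefined symbol written "_"), together with a
precedence matrix that is assumed here to be the one of the arithmetic grammar extended with
the delimiter #. Since every lookup yields a single state, the helper `replay`, a
hypothetical name, simply follows the only available move.

```python
S1, E1, E2, T2, BOT = "S1", "E1", "E2", "T2", "_"

# Assumed precedence matrix over {a, x, +} plus the delimiter #.
OPM = {
    ("#", "a"): "<", ("#", "x"): "<", ("#", "+"): "<",
    ("+", "a"): "<", ("+", "x"): "<", ("x", "a"): "=",
    ("a", "x"): ">", ("a", "+"): ">", ("+", "+"): ">",
    ("a", "#"): ">", ("+", "#"): ">",
}
PUSH = {
    ((S1, BOT), "a"): (T2, T2),
    ((S1, T2), "x"): (E2, BOT),
    ((S1, E2), "+"): (E1, BOT),
    ((E2, BOT), "a"): (E2, E2),
    ((E1, BOT), "a"): (T2, T2),
}
FLUSH = {
    ((T2, T2), (E1, BOT)): (E1, T2),
    ((T2, T2), (S1, BOT)): (S1, T2),
    ((E2, E2), (S1, T2)): (S1, E2),
    ((E1, T2), (S1, E2)): (S1, E1),
}
FINAL = {(S1, E1), (S1, E2)}


def replay(word, start=(S1, BOT)):
    """Return the list of move names of the run on `word`, or None if it gets stuck."""
    stack, i, moves = [("#", False, start)], 0, []
    inp = list(word) + ["#"]
    while not (len(stack) == 1 and inp[i] == "#"):
        sym, _, state = stack[-1]
        rel = OPM.get((sym, inp[i]))
        if rel in ("<", "=") and (state, inp[i]) in PUSH:
            stack.append((inp[i], rel == "<", PUSH[(state, inp[i])]))
            moves.append("mark" if rel == "<" else "push")
            i += 1
        elif rel == ">":
            marked = [j for j, entry in enumerate(stack) if entry[1]]
            if not marked:
                return None
            k = marked[-1]                     # topmost marked symbol
            below_sym, below_mark, below_state = stack[k - 1]
            if (state, below_state) not in FLUSH:
                return None
            stack[k - 1:] = [(below_sym, below_mark, FLUSH[(state, below_state)])]
            moves.append("flush")
        else:
            return None
    return moves if stack[0][2] in FINAL else None


TRACE = replay("axa+a")
print(TRACE)
```

The sequence of move names returned by the sketch can be compared line by line with the
computation of Figure 5.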



The equivalence between $G$ and the automaton described above is based on the
following lemma, whose proof is omitted because of space reasons. As usual we set
$\Gamma = (\Sigma \cup \Sigma') \times Q = (\Sigma \cup \Sigma') \times (EN \times (EN \cup \{\bot\}))$
and we denote an element in $\Gamma$ as $[a\ (X, Y)]$. To avoid an excessively cumbersome
notation, when describing the transitions between configurations, we omit the extreme
parts (i.e. the lower part of the stack and a suffix of the input string) which are not
affected by the computation.

We define the depth of a computation $C_1 \stackrel{*}{\vdash} C_2$ as the maximum number of
marked symbols in one of the traversed configurations, minus the number of marked symbols on
the stack in configuration $C_1$; we define the depth of a derivation
$X \stackrel{*}{\Rightarrow} \alpha$ as the depth of the corresponding derivation tree. When
useful, we make the depth $h$ of a computation or a derivation explicit as in
$C_1 \stackrel{[h]}{\vdash} C_2$ and $X \stackrel{[h]}{\Rightarrow} \alpha$.
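Lemma 1. Let $Y, W$ be extended nonterminals of $G$, $v \in \Sigma^*$,
$a \lessdot v \gtrdot b$, and $\hat{a} \in \{a, a'\}$. Then for all $h \ge 1$:

\[ \langle [\hat{a}\ (Y, \bot)],\ vb \rangle \stackrel{[h]}{\vdash} \langle [\hat{a}\ (Y, W)],\ b \rangle \quad\text{iff}\quad \exists \alpha, \beta \text{ such that } Y \to \alpha a W \beta,\ W \stackrel{[h]}{\Rightarrow} v \text{ in } G. \]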
Proof. The lemma is equivalent to the following two properties. 

(i) For every Y,X, a < c <x > d, A admits the computation 

{[a (Y, ±)] , cxd) 'h {{a {Y,X)\ , d) 

if and only if there exist 'W,a,[i,y,e such that Y — > aaWp, W => Xy, X 
ce, e X. 

(ii) For every Y,X,Z, a < d <z> e, A admits the computation 

{{a {Y,X)] , dze) h' {{a {Y,Z)] , d) 

* 

if and only if there exist W,a,/3,fi,A such that Y — > aaWfi, W ^ Zfi, Z — > 
W 

XdA, A^z. 



[Figure: derivation-tree sketches for the statement of Lemma 1, Property (i), and
Property (ii).]



Notice that in IV and X may coincide (i.e., y may be empty), and in (|n]l and Z 
may coincide (i.e., ju may be empty). For /i = 1, the lemma is given by property ^ with 
W - X and k = (for cx - v,d - b); for /i > 1 we have v - cxd\Z\d2 ■ ■ ■ d„z„ for some 
c <x> d[, di <Zi > di+i (with x, Zi possibly empty). Then, applying first property (Qi and 
then, repeatedly, property one gets the lemma. 

We prove property ^ reasoning by induction on k. Fkst let A; = 0; in this case 
e - X, i.e. X — » cx. Hence, if .jc = c\ . . . c„, during the computation defined in A 
has to execute the following series of moves: a marked (c, Xq, y)-push transition (case 
2 without Z), then a sequence of (c,, Xo,Xo)-push transitions (case 1 without Z), and 
finally a (Xq, y)-flush transition, for a suitable Xq: 



{T][d (F, ±)][c' {Xo, ±)][ci (Xo, ±)]... [cn {Xo,Xo)] , d) h {T][a iY,Xo)] , d). 
To end in the right configuration, we necessarily have $X_0 = X$. Moreover, by the
definition of transitions in $A$, $X$ must satisfy exactly the relations defined in (i).
Vice versa, if the grammar admits the derivation defined in (i), then obviously the
automaton $A$ admits the previous moves.

One can prove similarly property ^ for A: = 0: in this case, both the marked 
(d, Z, F)-push transition and the (Z, y)-flush transition involve the extended nonterminal 
X (i.e., the second component of the state on the top of the stack). 

Now, assuming that properties ^ and ^ hold for depths lower than k, we prove 
them for A:. First consider ^ and let x - uqC\U\C2 ■ ■ ■ c„,u,„ with c <(<(>, m,_i >Cj<Ui (with 
any m,- possibly empty), and c, = c,+i. By the definition of the transition function, A 
admits the computation in ^ if and only if there exist W, a,/3, y, e as in and moreover 

[k,] 

there exist Uo, ■ ■ ■ U,„ such that e = UoCiUi . . .CmUm and t/,- ^ m/ with kj < k (Ui is 
missing iff" m, is empty). Hence one can apply the inductive hypothesis and get the result. 

One can prove similarly property ^ for k greater than 0: again, in this case, both 
the marked (d, Z, y)-push transition and the (Z, y)-flush transition involve the extended 
nonterminal X. □ 

From the lemma the theorem easily follows by using a special case S A (with 
implicit # as a and b). 

4.2 From Floyd automata to Floyd grammars 

Given a Floyd automaton $A = \langle \Sigma, M, Q, I, F, \delta \rangle$, we show how to build
an equivalent Floyd grammar $G$ having operator precedence matrix $M$. In order to keep the
construction as easy as possible, w.l.o.g. we assume that $M$ is $\doteq$-acyclic. Recall
that, as discussed in Section 2, this hypothesis could be replaced by weaker ones.

We need some notation and definitions. First of all, we shall represent a push
transition with a simple arrow $\longrightarrow$, a flush transition with a double arrow
$\Longrightarrow$, and a path defined by a sequence of transitions with a wavy arrow
$\leadsto$.

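We define chains in $A$ recursively. A simple chain is a word
$a_0 a_1 a_2 \ldots a_n a_{n+1}$, written as
$\langle^{a_0} a_1 a_2 \ldots a_n {}^{a_{n+1}}\rangle$, such that:
$a_0, a_{n+1} \in \Sigma \cup \{\#\}$, $a_i \in \Sigma$ for every $i = 1, 2, \ldots, n$,
$M_{a_0 a_{n+1}} \ne \emptyset$, and
$a_0 \lessdot a_1 \doteq a_2 \ldots a_{n-1} \doteq a_n \gtrdot a_{n+1}$. A composed chain in
$A$ is a word $a_0 x_0 a_1 x_1 a_2 \ldots a_n x_n a_{n+1}$, where
$\langle^{a_0} a_1 a_2 \ldots a_n {}^{a_{n+1}}\rangle$ is a simple chain, and
$x_i \in \Sigma^*$ is the empty word or is such that
$\langle^{a_i} x_i {}^{a_{i+1}}\rangle$ is a chain (simple or composed), for every
$i = 0, 1, \ldots, n$. Such a composed chain will be written as
$\langle^{a_0} x_0 a_1 x_1 a_2 \ldots a_n x_n {}^{a_{n+1}}\rangle$.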

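We call a support for the simple chain
$\langle^{a_0} a_1 a_2 \ldots a_n {}^{a_{n+1}}\rangle$ any path in $A$ of the form

\[ q_0 \xrightarrow{a_1} q_1 \longrightarrow \cdots \longrightarrow q_{n-1} \xrightarrow{a_n} q_n \stackrel{q_0}{\Longrightarrow} q_{n+1} \qquad (1) \]

Notice that the label of the last (and only) flush is exactly $q_0$, i.e. the first state of
the path; this flush is executed because of relation $a_n \gtrdot a_{n+1}$. We call a support
for the composed chain
$\langle^{a_0} x_0 a_1 x_1 a_2 \ldots a_n x_n {}^{a_{n+1}}\rangle$ any path in $A$ of the form

\[ q_0 \stackrel{x_0}{\leadsto} q'_0 \xrightarrow{a_1} q_1 \stackrel{x_1}{\leadsto} q'_1 \xrightarrow{a_2} \cdots \xrightarrow{a_n} q_n \stackrel{x_n}{\leadsto} q'_n \stackrel{q'_0}{\Longrightarrow} q_{n+1} \qquad (2) \]

where, for every $i = 0, 1, \ldots, n$: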

Precedence Automata and Languages 13 



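- if $x_i \ne \varepsilon$, then $q_i \stackrel{x_i}{\leadsto} q'_i$ is a support for the
chain $\langle^{a_i} x_i {}^{a_{i+1}}\rangle$, i.e., it can be decomposed as
$q_i \stackrel{x_i}{\leadsto} q''_i \stackrel{q_i}{\Longrightarrow} q'_i$,

- if $x_i = \varepsilon$, then $q'_i = q_i$.

Notice that the label of the last flush is exactly $q'_0$.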

We are now able to define a Floyd grammar $G = (N, \Sigma, P, S)$. Nonterminals are the
4-tuples $(a, q, p, b) \in \Sigma \times Q \times Q \times \Sigma$, written as
$\langle^{a} p, q^{b} \rangle$, plus the axiom $S$. Rules are built as follows:

- for every support of type (1) of a simple chain, add the rule

\[ \langle^{a_0} q_0, q_{n+1}{}^{a_{n+1}} \rangle \to a_1 a_2 \ldots a_n ; \]

if also $a_0 = a_{n+1} = \#$, $q_0$ is initial, and $q_{n+1}$ is final, add the rule
$S \to \langle^{\#} q_0, q_{n+1}{}^{\#} \rangle$;

- for every support of type (2) of a composed chain, add the rule

\[ \langle^{a_0} q_0, q_{n+1}{}^{a_{n+1}} \rangle \to N_0 a_1 N_1 a_2 \ldots a_n N_n ; \]

where, for every $i = 0, 1, \ldots, n$, $N_i = \langle^{a_i} q_i, q'_i{}^{a_{i+1}} \rangle$ if
$x_i \ne \varepsilon$ and $N_i = \varepsilon$ otherwise.

Notice that the above construction is effective thanks to the hypothesis of
$\doteq$-acyclicity of the OPM. This implies that the length of the r.h.s. is bounded (see
Section 2); on the other hand, the cardinality of the nonterminal alphabet is finite. Hence
there is only a finite number of possible productions for $G$ and only a limited number of
chains to be considered.

5 $\omega$-languages

Having an operational model that defines Floyd Languages, it is now straightforward to
introduce extensions to $\omega$-languages.

For instance, the classical Büchi condition of acceptance can be easily adapted to
FAs. Consider an infinite word $x \in \Sigma^\omega$, and an infinite computation of the
automaton $A_\omega = \langle \Sigma, M, Q, I, F, \delta \rangle$ on $x$, i.e. an
$\omega$-sequence of configurations
$S = \langle \beta_0, x_0 \rangle \langle \beta_1, x_1 \rangle \ldots$, such that
$\langle \beta_0, x_0 \rangle = \langle [\#\ q_I], x \rangle$, $q_I \in I$ and
$\langle \beta_i, x_i \rangle \vdash \langle \beta_{i+1}, x_{i+1} \rangle$. We say that
$x \in L(A_\omega)$ if and only if there exists $q_F \in F$ such that configurations with
stack $[\#\ q_F]$ occur infinitely often in $S$.

Quite naturally, $\omega$-VPLs are a proper subset of this class of languages, as is shown
by the following example.

Example 4. We define here the stack management of a simple programming language
that is able to handle nested exceptions. For simplicity, there are only two procedures,
called $a$ and $b$. Calls and returns are denoted by $call_a$, $call_b$, $ret_a$, $ret_b$, respectively.
During execution, it is possible to install an exception handler $hnd$. The last signal that
we use is $rst$, which is issued when an exception occurs, or after a correct execution to
uninstall the handler. With a $rst$ the stack is "flushed", restoring the state right before
the last $hnd$. The automaton is presented in Figure 6 (notice that it is an extension of the
automaton in Figure 2). It is easy to modify this example to model the case of unnested
exceptions, to fit other application contexts.
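
The intended stack discipline can be illustrated with a small simulation. This is an informal sketch, not the Floyd automaton of Figure 6: the function name `run_signals`, the string encoding of the signals, the requirement that a return matches the call on top of the stack, and the treatment of a `rst` without an installed handler as an error are all assumptions made for illustration.

```python
from typing import List


def run_signals(signals: List[str]) -> List[str]:
    """Return the stack left after processing a finite prefix of signals."""
    stack: List[str] = []
    for s in signals:
        if s in ("call_a", "call_b", "hnd"):
            # Calls and handler installations are pushed.
            stack.append(s)
        elif s in ("ret_a", "ret_b"):
            # Assumption: a return must match the call on top of the stack.
            expected = "call_" + s[-1]
            if not stack or stack[-1] != expected:
                raise ValueError(f"unmatched {s}")
            stack.pop()
        elif s == "rst":
            # Flush everything above the last hnd, then the hnd itself,
            # restoring the stack as it was right before that hnd.
            while stack and stack[-1] != "hnd":
                stack.pop()
            if not stack:
                # Assumption: rst without an installed handler is rejected.
                raise ValueError("rst without an installed handler")
            stack.pop()
        else:
            raise ValueError(f"unknown signal {s}")
    return stack
```

Note that `rst` discards pending calls without matching returns, which is the behavior that a visibly pushdown stack discipline cannot express with a single input symbol.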

Fig. 6. Precedence matrix and automaton for an $\omega$-language. There is no column
indexed by # since words are infinite.



6 Conclusions and further research 

Recently, we advocated that operator precedence grammars and languages, here re- 
named after their inventor Robert Floyd, deserve renewed attention in the realm of 
formal languages. The main reasons to support our claim are: 

- The fact that this family of languages properly includes visibly pushdown languages [2],
a new family that has been proposed with the main motivation of extending powerful model
checking techniques beyond the limits of finite state machines.

- The fact that it enjoys all closure properties with respect to the main algebraic 
operations that are exhibited by regular languages and VPLs. 

- The fact that, unlike other deterministic languages -either strictly more powerful
than them, or incomparable with them- such as LR, LL, and simple precedence
ones, FLs can be parsed without applying a strictly left-to-right order; this feature
becomes particularly relevant these days since it allows much better exploitation of
the gains in efficiency offered by massive parallelism.

In this paper we filled a rather surprising "hole" in the theory of these languages, namely
the lack of an appropriate family of automata that perfectly matches the generative
power of their grammars. We defined FAs with such a goal in mind and we proved their
equivalence with FGs. Both tasks turned out to be non-trivial and showed further
interesting peculiarities of this pioneering family of deterministic languages. A first
"byproduct" of the new automata family is the extension of FLs to $\omega$-languages, i.e.,
languages consisting of infinite strings, an increasingly important aspect of formal
language theory needed to deal with never-ending computations. In this case too, FL
$\omega$-languages proved to augment the descriptive capabilities of the original VPLs.

As a first step towards applicability of the results presented in this paper, and also
to validate our approach with several practical examples, we implemented a simple
prototypical tool, called Flup. Flup contains an interpreter for non-deterministic Floyd
Automata, and a Floyd Grammar to Automata translator that directly applies the
construction presented in Section 4.1. All the examples presented in the paper were tried
on, or generated by, the tool.

The prototype is freely available at http://home.dei.polimi.it/pradella

We are confident that suitable future research will further strengthen the importance
of, and motivation for, re-inserting FLs in the mainstream of formal language literature.
In particular it would be interesting to complete the parallel analysis and comparison
with VPLs by investigating a characterization in terms of suitable logic formulas;
in this way motivation for, and application of, strong model checking techniques would
be further enhanced.

Acknowledgement. We thank Federica Panella for her comments and suggestions,
especially with respect to the construction of Theorem 1 and $\omega$-languages.

Appendix: proof of Theorem 1

Notation. We use $J, \bar J, J', J_i, \dots$ to denote states in $\tilde Q$ and $K, \bar K, K', K_i, \dots$ to denote
sets of pairs in $Q \times (Q \cup \{\bot\})$. We use arrows $\longrightarrow$ and $\Longrightarrow$ to denote push and flush
transitions, respectively, both in $A$ and in $\tilde A$.

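Remarks

i) By the definition of $\tilde\delta_{flush}$, if $\langle b, K \rangle \stackrel{\langle a, \bar K \rangle}{\Longrightarrow} \langle a, K' \rangle$ in $\tilde A$ and $(q', p) \in K'$, then there
exists a pair $(r, q) \in K$ such that $(q, p) \in \bar K$ and $r \stackrel{q}{\Longrightarrow} q'$ in $A$.

ii) By the definition of $\tilde\delta_{push}$, if $\langle b, K \rangle \stackrel{a}{\longrightarrow} \langle a, \bar K \rangle$ in $\tilde A$, $(r, \bar q) \in \bar K$, and $b \doteq a$, then there
exists a state $q \in Q$ such that $q \stackrel{a}{\longrightarrow} r$ in $A$ and $(q, \bar q) \in K$.

iii) By the definition of $\tilde\delta_{push}$, if $\langle b, K \rangle \stackrel{a}{\longrightarrow} \langle a, \bar K \rangle$ in $\tilde A$, $(q', q) \in \bar K$, $(q, p) \in K$, and
$b \lessdot a$, then $q \stackrel{a}{\longrightarrow} q'$ in $A$.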
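
The three remarks can be made concrete with a small sketch of the transition functions of $\tilde A$. The definitions below are reconstructed from Remarks i)-iii) and are an assumption: the exact definitions are given in the main text and may differ in detail. The names `push_det`, `flush_det` and `BOT`, the dictionary encoding of $\delta_{push}$ and $\delta_{flush}$, and passing the precedence relation as a string instead of looking it up in the OPM are all hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Set, Tuple

BOT = None  # stands for the bottom symbol in Q x (Q u {bottom})

Pair = Tuple[Hashable, Hashable]
DetState = Tuple[str, FrozenSet[Pair]]


def push_det(state: DetState, a: str, prec: str,
             delta_push: Dict[Tuple[Hashable, str], Set[Hashable]]) -> DetState:
    """Push transition of the deterministic automaton on terminal a.

    prec is "<" when b yields precedence to a, "=" when b and a are equal
    in precedence (b is the terminal stored in `state`).
    """
    _, K = state
    if prec == "<":
        # A new chain starts: remember the state q it was entered from.
        K_new = {(r, q) for (q, p) in K for r in delta_push.get((q, a), ())}
    elif prec == "=":
        # Same chain body: keep the remembered state p unchanged.
        K_new = {(r, p) for (q, p) in K for r in delta_push.get((q, a), ())}
    else:
        raise ValueError("push requires '<' or '='")
    return (a, frozenset(K_new))


def flush_det(top: DetState, below: DetState,
              delta_flush: Dict[Tuple[Hashable, Hashable], Set[Hashable]]) -> DetState:
    """Flush transition from `top`, labeled by the state `below`."""
    _, K = top
    a, K_bar = below
    K_new = {(q_next, p)
             for (r, q) in K
             for (q_below, p) in K_bar if q_below == q
             for q_next in delta_flush.get((r, q), ())}
    return (a, frozenset(K_new))
```

Each pair $(r, q)$ records a current state $r$ of $A$ together with the state $q$ from which the innermost open chain was entered, which is exactly the information the flush needs to pick the correct label.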

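Lemma 2. Let $C = \langle {}^{a}y^{b} \rangle$ be a chain and let $\langle a, K \rangle \stackrel{y}{\leadsto} \langle a', K' \rangle$ be a support for $C$ in $\tilde A$.
Then $a' = a$ and the support has the form

\[
\langle a, K \rangle \stackrel{x_0}{\leadsto} \langle a, \bar K_0 \rangle \stackrel{a_1}{\longrightarrow} \langle a_1, K_1 \rangle \stackrel{x_1}{\leadsto} \langle a_1, \bar K_1 \rangle \stackrel{a_2}{\longrightarrow} \dots \stackrel{a_n}{\longrightarrow} \langle a_n, K_n \rangle \stackrel{x_n}{\leadsto} \langle a_n, \bar K_n \rangle \stackrel{\langle a, \bar K_0 \rangle}{\Longrightarrow} \langle a, K' \rangle \tag{3}
\]

where $y = x_0 a_1 x_1 a_2 \dots a_n x_n$ and $\langle {}^{a} a_1 a_2 \dots a_n {}^{b} \rangle$ is a simple chain. Any word $x_i$ may be
empty and in this case $\bar K_i = K_i$.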

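Proof. We argue by induction on the number of flush transitions in the support. If there
is only one flush transition, then the chain is simple, i.e., $y = a_1 a_2 \dots a_n$ with
$a \lessdot a_1 \doteq a_2 \doteq \dots \doteq a_n \gtrdot b$, and by the definition of $\tilde\delta_{push}$, the support can be rewritten as

\[
\langle a, K \rangle \stackrel{a_1}{\longrightarrow} \langle a_1, K_1 \rangle \stackrel{a_2}{\longrightarrow} \dots \stackrel{a_{n-1}}{\longrightarrow} \langle a_{n-1}, K_{n-1} \rangle \stackrel{a_n}{\longrightarrow} \langle a_n, K_n \rangle \stackrel{\langle a, K \rangle}{\Longrightarrow} \langle a', K' \rangle \tag{4}
\]

By the definition of $\tilde\delta_{flush}$, we get $a' = a$.

Now assume that the statement holds for supports with $k$ flush transitions at most.
Let $y = x_0 a_1 x_1 a_2 \dots a_n x_n$, where $\langle {}^{a} a_1 a_2 \dots a_n {}^{b} \rangle$ is a simple chain, and consider the
support

\[
\langle a, K \rangle \stackrel{x_0}{\leadsto} \bar J_0 \stackrel{a_1}{\longrightarrow} \langle a_1, K_1 \rangle \stackrel{x_1}{\leadsto} \bar J_1 \dots \stackrel{a_n}{\longrightarrow} \langle a_n, K_n \rangle \stackrel{x_n}{\leadsto} \bar J_n \stackrel{\bar J_0}{\Longrightarrow} \langle a', K' \rangle
\]

where, for every $i = 0, 1, 2, \dots, n$, the support labeled by $x_i$ contains $k$ flush transitions at
most. The inductive hypothesis implies that $\bar J_i$ has the form $\langle a_i, \bar K_i \rangle$ for some $\bar K_i$ (where
$a_0 = a$). In particular the state $\bar J_0$ has the form $\langle a, \bar K_0 \rangle$ hence, by the definition of $\tilde\delta_{flush}$,
we have $a' = a$. □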

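Lemma 3. Let $C = \langle {}^{a}y^{b} \rangle$ be a chain and $q \stackrel{y}{\leadsto} q'$ be a support for $C$ in $A$. Then, for
every $p \in Q$ and $K \subseteq Q \times (Q \cup \{\bot\})$, if $K \ni (q, p)$, there exists a support

\[
\langle a, K \rangle \stackrel{y}{\leadsto} \langle a, K' \rangle
\]

for $C$ in $\tilde A$ with $K' \ni (q', p)$.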
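
Proof. We argue by induction on the number of flush transitions contained in the support
$q \stackrel{y}{\leadsto} q'$. If there is only one flush transition, then $y = a_1 a_2 \dots a_n$ with
$a \lessdot a_1 \doteq a_2 \doteq \dots \doteq a_n \gtrdot b$ and the support can be rewritten as

\[
q = q_0 \stackrel{a_1}{\longrightarrow} q_1 \stackrel{a_2}{\longrightarrow} \dots \stackrel{a_{n-1}}{\longrightarrow} q_{n-1} \stackrel{a_n}{\longrightarrow} q_n \stackrel{q_0}{\Longrightarrow} q'
\]

Set $K_0 = K$, $a_0 = a$, and

$\langle a_i, K_i \rangle = \tilde\delta_{push}(\langle a_{i-1}, K_{i-1} \rangle, a_i)$, for every $i = 1, 2, \dots, n$
$\langle a, K' \rangle = \tilde\delta_{flush}(\langle a_n, K_n \rangle, \langle a, K \rangle)$

Then

\[
\langle a, K \rangle \stackrel{a_1}{\longrightarrow} \langle a_1, K_1 \rangle \dots \stackrel{a_{n-1}}{\longrightarrow} \langle a_{n-1}, K_{n-1} \rangle \stackrel{a_n}{\longrightarrow} \langle a_n, K_n \rangle \stackrel{\langle a, K \rangle}{\Longrightarrow} \langle a, K' \rangle
\]

is a support for $C$ in $\tilde A$. Moreover, since $K \ni (q, p)$, by the definition of $\tilde\delta$ we have:

$K_1 \ni (q_1, q)$ since $a \lessdot a_1$ and $\delta_{push}(q, a_1) \ni q_1$

$K_i \ni (q_i, q)$ since $a_{i-1} \doteq a_i$ and $\delta_{push}(q_{i-1}, a_i) \ni q_i$

$K' \ni (q', p)$ since $a_n \gtrdot b$ and $\delta_{flush}(q_n, q) \ni q'$

Now assume that the statement holds for supports with $k$ flush transitions at most.
Let $y = x_0 a_1 x_1 a_2 \dots a_n x_n$, where $\langle {}^{a} a_1 a_2 \dots a_n {}^{b} \rangle$ is a simple chain, and consider the
support

\[
q = q_0 \stackrel{x_0}{\leadsto} \bar q_0 \stackrel{a_1}{\longrightarrow} q_1 \stackrel{x_1}{\leadsto} \bar q_1 \stackrel{a_2}{\longrightarrow} \dots \stackrel{a_n}{\longrightarrow} q_n \stackrel{x_n}{\leadsto} \bar q_n \stackrel{\bar q_0}{\Longrightarrow} q'
\]

where $\bar q_i = q_i$ whenever $x_i$ is the empty word and, for every $i = 0, 1, 2, \dots, n$, the support
labeled by $x_i$ contains $k$ flush transitions at most.
Set $J_0 = \langle a, K \rangle$ and

$\bar J_i = \tilde\delta(J_i, x_i)$ for every $i = 0, 1, \dots, n$

$J_i = \tilde\delta_{push}(\bar J_{i-1}, a_i)$ for every $i = 1, 2, \dots, n$

$J' = \tilde\delta_{flush}(\bar J_n, \bar J_0)$.

Then

\[
\langle a, K \rangle \stackrel{x_0}{\leadsto} \bar J_0 \stackrel{a_1}{\longrightarrow} J_1 \stackrel{x_1}{\leadsto} \bar J_1 \stackrel{a_2}{\longrightarrow} \dots \stackrel{a_n}{\longrightarrow} J_n \stackrel{x_n}{\leadsto} \bar J_n \stackrel{\bar J_0}{\Longrightarrow} J' \tag{5}
\]

is a support of $C$ in $\tilde A$. By Lemma 2 there exist $K_i, \bar K_i, K'$ such that $J_i = \langle a_i, K_i \rangle$,
$\bar J_i = \langle a_i, \bar K_i \rangle$, and $J' = \langle a, K' \rangle$, where $a_0 = a$, i.e., the support is (3).
Moreover, since $K \ni (q, p)$, by the definition of $\tilde\delta$ we have:

$\bar K_0 \ni (\bar q_0, p)$ by inductive hypothesis on the support $q = q_0 \stackrel{x_0}{\leadsto} \bar q_0$

$K_1 \ni (q_1, \bar q_0)$ since $a_0 \lessdot a_1$ and $\delta_{push}(\bar q_0, a_1) \ni q_1$

$\bar K_1 \ni (\bar q_1, \bar q_0)$ by inductive hypothesis on the support $q_1 \stackrel{x_1}{\leadsto} \bar q_1$

$K_i \ni (q_i, \bar q_0)$ since $a_{i-1} \doteq a_i$ and $\delta_{push}(\bar q_{i-1}, a_i) \ni q_i$

$\bar K_i \ni (\bar q_i, \bar q_0)$ by inductive hypothesis on the support $q_i \stackrel{x_i}{\leadsto} \bar q_i$

$K' \ni (q', p)$ since $\delta_{flush}(\bar q_n, \bar q_0) \ni q'$

□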

Lemma 4. Let $C = \langle {}^{a}y^{b} \rangle$ be a chain and $\langle a, K \rangle \stackrel{y}{\leadsto} \langle a, K' \rangle$ be a support for $C$ in $\tilde A$.
Then, for every $p, q' \in Q$, if $K' \ni (q', p)$ there exists a support $q \stackrel{y}{\leadsto} q'$ for $C$ in $A$ with
$(q, p) \in K$.

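Proof. We argue by induction on the number of flush transitions contained in the support
$\langle a, K \rangle \stackrel{y}{\leadsto} \langle a, K' \rangle$. If there is only one flush transition, then $y = a_1 a_2 \dots a_n$ with
$a_0 \lessdot a_1 \doteq a_2 \doteq \dots \doteq a_n \gtrdot a_{n+1}$ and the support can be rewritten as in (4). Let $K' \ni (q', p)$;
then, by Remark i) there exists a pair $(q_n, q) \in K_n$ such that $(q, p) \in K$ and $q_n \stackrel{q}{\Longrightarrow} q'$ in
$A$. Moreover, $(q_n, q) \in K_n$, $\langle a_{n-1}, K_{n-1} \rangle \stackrel{a_n}{\longrightarrow} \langle a_n, K_n \rangle$ and $a_{n-1} \doteq a_n$ imply by Remark ii)
the existence of a state $q_{n-1} \in Q$ such that $(q_{n-1}, q) \in K_{n-1}$ and $q_{n-1} \stackrel{a_n}{\longrightarrow} q_n$. Similarly
one can verify that for every $i = n-2, \dots, 1$ there exists $q_i \in Q$ such that $(q_i, q) \in K_i$ and
$q_i \stackrel{a_{i+1}}{\longrightarrow} q_{i+1}$. In particular, $\langle a, K \rangle \stackrel{a_1}{\longrightarrow} \langle a_1, K_1 \rangle$, $(q_1, q) \in K_1$, $(q, p) \in K$, and $a \lessdot a_1$ imply
by Remark iii) that $q \stackrel{a_1}{\longrightarrow} q_1$ in $A$. Thus, we built backward a path

\[
q \stackrel{a_1}{\longrightarrow} q_1 \stackrel{a_2}{\longrightarrow} q_2 \dots \stackrel{a_n}{\longrightarrow} q_n \stackrel{q}{\Longrightarrow} q'
\]

with $(q, p) \in K$, and this concludes the proof of the induction basis.

Now assume that the statement holds for supports with $k$ flush transitions at most.
Let $y = x_0 a_1 x_1 a_2 \dots a_n x_n$, where $\langle {}^{a} a_1 a_2 \dots a_n {}^{b} \rangle$ is a simple chain, and consider a support
of the form

\[
\langle a, K \rangle \stackrel{x_0}{\leadsto} \bar J_0 \stackrel{a_1}{\longrightarrow} J_1 \stackrel{x_1}{\leadsto} \bar J_1 \stackrel{a_2}{\longrightarrow} \dots \stackrel{a_n}{\longrightarrow} J_n \stackrel{x_n}{\leadsto} \bar J_n \stackrel{\bar J_0}{\Longrightarrow} \langle a, K' \rangle
\]

where $\bar J_i = J_i$ whenever $x_i$ is the empty word and, for every $i = 0, 1, 2, \dots, n$, the support
labeled by $x_i$ contains $k$ flush transitions at most. Then by Lemma 2 the support can be
rewritten as in (3).

Let $(q', p) \in K'$. Since $\langle a_n, \bar K_n \rangle \stackrel{\langle a, \bar K_0 \rangle}{\Longrightarrow} \langle a, K' \rangle$, by Remark i) there exists a pair
$(\bar q_n, \bar q_0) \in \bar K_n$ with $(\bar q_0, p) \in \bar K_0$ and $\bar q_n \stackrel{\bar q_0}{\Longrightarrow} q'$ in $A$. By the inductive hypothesis, since
$(\bar q_n, \bar q_0) \in \bar K_n$ there exists a support $q_n \stackrel{x_n}{\leadsto} \bar q_n$ with $(q_n, \bar q_0) \in K_n$.

Similarly one can see that, for all $i = n-1, \dots, 2, 1$, there exist $q_i$ and $\bar q_i$ such that

\[
q_i \stackrel{x_i}{\leadsto} \bar q_i \stackrel{a_{i+1}}{\longrightarrow} q_{i+1}
\]

with $(\bar q_i, \bar q_0) \in \bar K_i$ by Remark ii) (since $\langle a_i, \bar K_i \rangle \stackrel{a_{i+1}}{\longrightarrow} \langle a_{i+1}, K_{i+1} \rangle$ in $\tilde A$, $(q_{i+1}, \bar q_0) \in K_{i+1}$
and $a_i \doteq a_{i+1}$), and $(q_i, \bar q_0) \in K_i$ by the inductive hypothesis (since $\langle a_i, K_i \rangle \stackrel{x_i}{\leadsto} \langle a_i, \bar K_i \rangle$ in
$\tilde A$ and $(\bar q_i, \bar q_0) \in \bar K_i$).

In particular $q_1 \stackrel{x_1}{\leadsto} \bar q_1$ with $(q_1, \bar q_0) \in K_1$. Then, since also $\langle a, \bar K_0 \rangle \stackrel{a_1}{\longrightarrow} \langle a_1, K_1 \rangle$,
$(\bar q_0, p) \in \bar K_0$, and $a \lessdot a_1$, by Remark iii) we get $\bar q_0 \stackrel{a_1}{\longrightarrow} q_1$. Finally, since $(\bar q_0, p) \in \bar K_0$
and $\langle a, K \rangle \stackrel{x_0}{\leadsto} \langle a, \bar K_0 \rangle$, the inductive hypothesis implies the existence of a state $q \in Q$
such that $q \stackrel{x_0}{\leadsto} \bar q_0$ in $A$ with $(q, p) \in K$. Hence we built a support

\[
q \stackrel{x_0}{\leadsto} \bar q_0 \stackrel{a_1}{\longrightarrow} q_1 \stackrel{x_1}{\leadsto} \bar q_1 \stackrel{a_2}{\longrightarrow} \dots \stackrel{a_{n-1}}{\longrightarrow} q_{n-1} \stackrel{x_{n-1}}{\leadsto} \bar q_{n-1} \stackrel{a_n}{\longrightarrow} q_n \stackrel{x_n}{\leadsto} \bar q_n \stackrel{\bar q_0}{\Longrightarrow} q'
\]

with $(q, p) \in K$ and this concludes the proof.

□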
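
We are now ready to prove Theorem 1, i.e., we prove that there exists an accepting
computation for $y$ in $A$ if and only if there exists an accepting computation for $y$ in $\tilde A$.

Let $c$ be an accepting computation for $y$ in $A$. Then $\langle {}^{\#}y^{\#} \rangle$ is a chain that admits a
support $q_0 \stackrel{y}{\leadsto} q'$ in $A$, with $q_0 \in I$ and $q' \in F$. For $K = I \times \{\bot\} \ni (q_0, \bot)$,
Lemma 3 implies the existence of a support $\tilde I = \langle \#, K \rangle \stackrel{y}{\leadsto} \langle \#, K' \rangle$ for $y$ in $\tilde A$ with
$K' \ni (q', \bot)$. $q' \in F$ implies $\langle \#, K' \rangle \in \tilde F$, hence the support defines an accepting computation
for $y$ in $\tilde A$.

Vice versa, let $c$ be an accepting computation for $y$ in $\tilde A$. Then $\langle {}^{\#}y^{\#} \rangle$ is a chain that
admits a support $\tilde I \stackrel{y}{\leadsto} J'$ in $\tilde A$, with $J' \in \tilde F$. This means that there exists $q' \in F$ such
that $\langle \#, q', \bot \rangle \in J'$. Hence, by Lemma 4 there exists a support $q \stackrel{y}{\leadsto} q'$ in $A$ with
$\langle \#, q, \bot \rangle \in \tilde I$, and this implies $q \in I$. Thus the support $q \stackrel{y}{\leadsto} q'$ defines an accepting
computation for $y$ in $A$. □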



